fix: release lock before streaming and add kernel interrupt support #236
Draft

devin-ai-integration[bot] wants to merge 1 commit into main from fix/lock-orphan-on-disconnect
Conversation
Fixes #213 — `asyncio.Lock` in `messaging.py` not released on client disconnect, causing cascading timeouts.

Changes:
- Narrow lock scope in `ContextWebSocket.execute()` to only cover the prepare+send phase (Phase A), releasing it before result streaming (Phase B). This prevents orphaned locks on client disconnect.
- Schedule the env var cleanup task under the lock (before release) to avoid the race condition flagged in PRs #234/#235.
- Add a `POST /contexts/{id}/interrupt` endpoint that calls Jupyter's kernel interrupt API, allowing clients to stop long-running code without restarting the kernel (preserves state).
- Add `interrupt_code_context` / `interruptCodeContext` to the Python and JS SDKs.

Co-Authored-By: vasek <vasek.mlejnsky@gmail.com>
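The Phase A / Phase B split described above can be sketched with plain asyncio. This is an illustrative model, not the real `messaging.py`: `_send` and `_cleanup_env_vars` are stand-ins for the actual websocket send and cleanup coroutines, and the streamed chunks are fake.

```python
import asyncio


class ContextWebSocket:
    def __init__(self):
        self._lock = asyncio.Lock()
        self._cleanup_task = None

    async def _send(self, code):
        # Stand-in for sending the execute request over the kernel websocket.
        await asyncio.sleep(0)

    async def _cleanup_env_vars(self):
        # Stand-in for the env-var cleanup coroutine.
        await asyncio.sleep(0)

    async def execute(self, code):
        # Phase A (under lock): await the previous cleanup, send the
        # request, and schedule the next cleanup task *before* releasing
        # the lock — this closes the env-var race flagged in #234/#235.
        async with self._lock:
            if self._cleanup_task is not None:
                await self._cleanup_task
            await self._send(code)
            self._cleanup_task = asyncio.create_task(self._cleanup_env_vars())
        # Phase B (no lock): stream results. A client disconnect here can
        # no longer leave the lock orphaned.
        for i in range(3):
            yield f"chunk-{i}"
```

The key property is that the lock is held only across Phase A, so a consumer that abandons the stream mid-Phase-B never holds it.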
Summary
Fixes #213 — `asyncio.Lock` in `messaging.py` not released on client disconnect, causing cascading timeouts. Two changes:

1. Lock scope narrowing (`template/server/messaging.py`): splits `execute()` into Phase A (under lock: prepare env vars, send request, schedule cleanup task) and Phase B (no lock: stream results). When a client disconnects mid-stream, the lock is already released — no orphaned lock, no cascade. Addresses the env-var cleanup race flagged in PRs #234/#235 by creating `_cleanup_task` while still holding the lock.

2. Kernel interrupt endpoint (`template/server/main.py`, Python SDK, JS SDK): adds `POST /contexts/{id}/interrupt`, which proxies to Jupyter's kernel interrupt API. Exposed as `interrupt_code_context()` / `interruptCodeContext()` in the SDKs. Allows stopping long-running code without restarting the kernel (state preserved).

Review & Testing Checklist for Human
- `websockets==12.0`: after the lock is released, `_cleanup_env_vars` may call `self._ws.send()` concurrently with the next `execute()`'s `self._ws.send()` (under lock). The assumption is that `websockets` 12.x handles this internally — confirm this or add explicit serialization.
- The cleanup task is `await`-ed at the start of the next `execute()`. If the kernel is still busy with the previous code, this await blocks. Confirm this is acceptable behavior vs. the old cascading-lock behavior.
- Run `sleep(30)` with a 5s SDK timeout, then immediately run `print('hello')` on the same context. The second call should not be blocked by an orphaned lock.
- Call `interrupt_code_context` on a long-running execution, then verify that variables/imports from prior cells survive.
- The async `interrupt_code_context` doesn't send `X-Access-Token` — this matches the existing async `restart_code_context` pattern, but differs from the sync version, which does send it. Pre-existing inconsistency, but worth noting.

Notes
- Branch: `fix/lock-orphan-on-disconnect`. The key difference from PRs #234/#235 is scheduling `_cleanup_task` under the lock, to prevent the env-var isolation regression that Codex flagged.
- The `finally` block in Phase B uses `self._executions.pop(message_id, None)` for defensive cleanup if the generator is abandoned by Starlette on client disconnect.

Link to Devin session: https://app.devin.ai/sessions/d709ebe9b3e14cea89be89c9c2faa29e
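The defensive `finally` cleanup in Phase B can be sketched as follows. The queue-based streaming and the `_executions` dict are illustrative stand-ins for the real `messaging.py` internals; only the `pop(message_id, None)` pattern is taken from the PR.

```python
import asyncio


class ContextWebSocket:
    def __init__(self):
        self._executions = {}  # message_id -> asyncio.Queue of result chunks

    async def _stream(self, message_id):
        queue = self._executions.setdefault(message_id, asyncio.Queue())
        try:
            while True:
                item = await queue.get()
                if item is None:  # sentinel: execution finished
                    break
                yield item
        finally:
            # Runs even if Starlette abandons the generator on client
            # disconnect (GeneratorExit). pop(..., None) is a no-op if the
            # entry was already removed elsewhere, so cleanup is idempotent.
            self._executions.pop(message_id, None)
```

Because `finally` fires on both normal completion and generator abandonment, the `_executions` entry cannot leak when a client walks away mid-stream.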
Requested by: @mlejva
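For reference, the Jupyter call that the new `POST /contexts/{id}/interrupt` endpoint proxies to is `POST /api/kernels/{kernel_id}/interrupt` in the Jupyter Server REST API. A minimal sketch of building that request — the base URL and the helper name are assumptions, and mapping a context id to a kernel id is left out:

```python
from urllib.request import Request

# Assumed Jupyter server address inside the sandbox template.
JUPYTER_BASE = "http://localhost:8888"


def build_interrupt_request(kernel_id: str) -> Request:
    """Build the POST that asks Jupyter to interrupt a kernel.

    Interrupting delivers the equivalent of SIGINT to the kernel, so the
    running code stops but the kernel process (and all user state:
    variables, imports) survives — unlike a restart.
    """
    url = f"{JUPYTER_BASE}/api/kernels/{kernel_id}/interrupt"
    return Request(url, method="POST")
```

The server-side endpoint would resolve the context's kernel id, send this request (e.g. with `httpx`), and return the resulting status to the SDK caller.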